Saving costs with AWS Lambda Recursion Control!
AWS Lambda has a new recursion control feature that stops a Lambda function from running when it gets caught in an infinite or recursive loop.
There is an amazing blog post on this from the AWS team, which tests the feature with Amazon SQS, a Java Lambda function, and AWS SAM. Please check it out for more details.
However, I took a different approach and tested the feature using CDK, an SNS topic, and a TypeScript Lambda function.
Prerequisites
- Understanding of how AWS Lambda is invoked and how it works.
- Understanding of event-driven architecture, which involves an event source and a destination where Lambda sends its output.
Let's test it first
- To test this feature we will set up an SNS topic and a Lambda function that publishes a message back to its source SNS topic, which creates a recursive loop.
- The infrastructure for this test will be provisioned using AWS CDK.
- GitHub repo for the CDK code. To test this code, please enter your email address in the CDK code.
// to deploy
cdk deploy --all
// to destroy
cdk destroy --all
Architecture Diagram
What is a recursive loop in a Lambda function?
- AWS services (source) generate events that invoke Lambda functions, and Lambda functions then send messages to other AWS Services (destination).
- Usually, in event-driven architectures, the source and destination should not be the same.
- Due to misconfiguration or coding bugs, the Lambda function sends the processed event to the same AWS service (source) that invokes the Lambda function, causing a recursive loop.
- While this recursive loop runs, the Lambda function keeps being invoked indefinitely, which adds to the usage bill for the AWS account.
What is recursion control in Lambda?
- According to the AWS docs for this new feature, Lambda detects the loop and automatically breaks it. In other words, it stops invoking the function once it has been invoked more than 16 times in the same chain of requests.
- Behind the scenes, recursion control relies on AWS X-Ray tracing headers. When AWS services that support recursive loop detection send events to Lambda, those events are automatically annotated with metadata.
- The updated metadata includes a count of the number of times the event has invoked the function. That is how Lambda detects a recursive loop.
- You don't need to enable X-Ray active tracing for this feature to work.
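To make the counting idea concrete, here is a toy model of it in TypeScript. It is only a sketch of the behaviour described above, not Lambda's actual implementation: the names `TracedEvent`, `simulateRecursiveLoop`, and `MAX_INVOCATIONS_PER_CHAIN` are made up for illustration, and the only number taken from the article is the limit of 16.

```typescript
// Toy model of loop detection. NOT Lambda's real implementation.
// The limit of 16 comes from the article; every name here is hypothetical.
const MAX_INVOCATIONS_PER_CHAIN = 16;

interface TracedEvent {
  chainId: string;         // stands in for the X-Ray trace header
  invocationCount: number; // times this chain has already invoked the function
  body: string;
}

interface LoopResult {
  invocations: number; // how many times the function actually ran
  dropped: boolean;    // true if the chain was stopped by the limit
}

// Simulates a function that republishes every event back to its own source.
// maxSteps is a safety cap so the simulation itself can never run forever.
function simulateRecursiveLoop(first: TracedEvent, maxSteps = 1000): LoopResult {
  let event = first;
  let invocations = 0;
  for (let step = 0; step < maxSteps; step++) {
    // The "service" checks the counter carried with the event before invoking.
    if (event.invocationCount >= MAX_INVOCATIONS_PER_CHAIN) {
      return { invocations, dropped: true };
    }
    invocations++;
    // The function publishes back to the source; the counter travels with the event.
    event = { ...event, invocationCount: event.invocationCount + 1 };
  }
  return { invocations, dropped: false };
}

const result = simulateRecursiveLoop({ chainId: "trace-1", invocationCount: 0, body: "hello" });
console.log(result); // { invocations: 16, dropped: true }
```

Because the counter belongs to a single chain, a second event with its own `chainId` would start again from zero in this model, which matches the idea that only invocations in the same chain of requests are stopped.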
Note: Some important points about this feature
- Same chain of requests: A chain of requests is a sequence of Lambda invocations caused by the same triggering event. For example:
SNS → Lambda → SNS = recursive loop
- If your function is invoked more than 16 times in the same chain of requests, then Lambda automatically stops the next function invocation in that request chain and notifies you.
- Lambda stops only invocations that are part of the same chain of requests. For example, if your function is configured with multiple triggers, then invocations from other triggers aren't affected.
- This feature is free to use.
Supported AWS Services and SDK
AWS Services
- At the moment, Lambda recursion control is supported for events from Amazon SQS and Amazon SNS.
AWS SDKs
- Node.js: 2.1147.0 (SDK version 2), 3.105.0 (SDK version 3)
- Python: 1.24.46 (boto3), 1.27.46 (botocore)
- Java 8 and Java 11: 1.12.200 (SDK version 1), 2.17.135 (SDK version 2)
- Java 17: 2.20.81
- .NET: 3.7.293.0
- Ruby: 3.134.0
Types of notifications for Recursive loop
- Email - A single email (once in 24 hours) is sent to the AWS account's primary account contact and alternate operations contact.
- I am using AWS Organizations and an IAM role, so the email wasn't sent to my account email address; it went to my organization's account manager address.
- AWS Health Dashboard - This is inside your AWS account. A notification appears under the "Other notifications" section with details and a link to the Lambda function.
Note: For both email and health dashboard, notifications will take around 3 hours to arrive.
- CloudWatch alarms - To set up an alarm, watch the RecursiveInvocationsDropped metric. You can set an alarm that fires whenever this metric's count is more than 0.
- I haven't set up alarms personally for this test, so feel free to post about this in the comments. Would love to hear.
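As a starting point for such an alarm, the sketch below builds the request parameters for one in plain TypeScript. I have not deployed it for this test, so treat it as an assumption-laden outline: the `buildRecursionAlarm` helper and the local `RecursionAlarmInput` type are hypothetical, the field names follow the CloudWatch PutMetricAlarm request, and I am assuming the metric is published in the `AWS/Lambda` namespace with a `FunctionName` dimension like other Lambda metrics.

```typescript
// Local type covering only the PutMetricAlarm fields this sketch sets,
// declared here so the example has no dependencies.
interface RecursionAlarmInput {
  AlarmName: string;
  Namespace: string;
  MetricName: string;
  Dimensions: { Name: string; Value: string }[];
  Statistic: "Sum";
  Period: number; // seconds
  EvaluationPeriods: number;
  Threshold: number;
  ComparisonOperator: "GreaterThanThreshold";
  TreatMissingData: "notBreaching";
  AlarmActions: string[];
}

// Hypothetical helper: alarm as soon as at least one invocation is dropped.
function buildRecursionAlarm(functionName: string, alarmTopicArn: string): RecursionAlarmInput {
  return {
    AlarmName: `${functionName}-recursive-invocations-dropped`,
    Namespace: "AWS/Lambda", // assumption: same namespace as other Lambda metrics
    MetricName: "RecursiveInvocationsDropped",
    Dimensions: [{ Name: "FunctionName", Value: functionName }],
    Statistic: "Sum",
    Period: 60,
    EvaluationPeriods: 1,
    Threshold: 0, // "more than 0", as described above
    ComparisonOperator: "GreaterThanThreshold",
    TreatMissingData: "notBreaching", // no data points means no dropped invocations
    AlarmActions: [alarmTopicArn],
  };
}
```

The returned object is meant to be passed to `PutMetricAlarmCommand` from `@aws-sdk/client-cloudwatch`. If the alarm notifies an SNS topic, it should be a different topic from the one involved in the loop.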
How to fix this recursion loop?
- Remove or disable the trigger that invokes the Lambda function. In our case, that is the SNS topic trigger.
topic.addSubscription(new aws_sns_subscriptions.LambdaSubscription(hello)) //remove this trigger
- Identify and fix the code that may be causing the loop. In our case, I am sending the message from Lambda to the same SNS topic that is the source.
import { SNSClient, PublishCommand } from "@aws-sdk/client-sns";

export async function main() {
  // AWS_REGION is set by the Lambda runtime
  const client = new SNSClient({ region: process.env.AWS_REGION });
  try {
    const input = {
      // PublishInput
      TopicArn: process.env.TOPIC_ARN, // fix this: it is the same topic that triggers the function
      Message: "This is a message from lambda",
    };
    const command = new PublishCommand(input);
    const response = await client.send(command);
    console.log(response);
  } catch (error) {
    console.log(error);
  }
}
- Set the function's reserved concurrency to 0. This basically disables the Lambda function or, as AWS puts it, acts as an off switch for Lambda.
- If SQS is the source, set up a dead-letter queue (DLQ).
- If SNS is the source, set up an on-failure destination for the Lambda function.
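On top of the fixes above, the function itself can refuse to publish back to the topic that triggered it. The sketch below is one possible guard, not code from the repo: `wouldCauseLoop` and `handleEvent` are hypothetical names, the `publish` parameter stands in for the real SNS call, and the local types cover only the `Records[].Sns.TopicArn` part of an SNS-triggered event.

```typescript
// Minimal local types for the parts of an SNS-triggered Lambda event used here.
interface SnsRecordLike {
  Sns: { TopicArn: string; Message: string };
}
interface SnsEventLike {
  Records: SnsRecordLike[];
}

// Stand-in for the real SNS publish call, injected so the sketch has no dependencies.
type Publish = (topicArn: string, message: string) => Promise<void>;

// Hypothetical helper: true when publishing to targetTopicArn would feed
// the same topic that delivered this event.
function wouldCauseLoop(event: SnsEventLike, targetTopicArn: string): boolean {
  return event.Records.some((record) => record.Sns.TopicArn === targetTopicArn);
}

// Hypothetical handler body: skip the publish instead of starting a loop.
async function handleEvent(
  event: SnsEventLike,
  targetTopicArn: string,
  publish: Publish
): Promise<"published" | "skipped"> {
  if (wouldCauseLoop(event, targetTopicArn)) {
    return "skipped";
  }
  await publish(targetTopicArn, "This is a message from lambda");
  return "published";
}
```

In a real handler, `publish` would wrap `client.send(new PublishCommand(...))` and `targetTopicArn` would come from `process.env.TOPIC_ARN`. This guard only catches the direct SNS → Lambda → SNS case; loops that pass through other services still need the measures listed above.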
How to turn off this feature?
By default, this feature is turned on for the supported AWS services and SDKs. If your use case relies on recursive patterns, you can request to turn off Lambda recursive loop detection. To request this change, contact AWS Support.
Conclusion
- A recursive loop is not a common scenario, but when it happens it can cost your account a lot of money, and it can happen to anyone through misconfiguration or an oversight when writing Lambda code.
- Right now this feature is not supported for other services such as DynamoDB and S3 events. For DynamoDB and S3, the only way to detect a recursive loop is with CloudWatch alarms. Please make a note of this.
- I really hope this feature is supported in DevOps Guru in the future, so that it is very easy to find the cause of such a loop.